Abstract

Rice is the world's most important staple grown by millions of small-holder farmers. Sustaining rice production relies 
on the intelligent use of rice diversity. The 3,000 Rice Genomes Project is a giga-dataset of publicly available genome 
sequences (averaging 14x depth of coverage) derived from 3,000 accessions of rice with global representation of 
genetic and functional diversity. The seed of these accessions is available from the International Rice Genebank 
Collection. Together, they are an unprecedented resource for advancing rice science and breeding technology. 
Our immediate challenge now is to comprehensively and systematically mine this dataset to link genotypic 
variation to functional variation with the ultimate goal of creating new and sustainable rice varieties that can 
support a future world population that will approach 9.6 billion by 2050. 
Background

Rice (Oryza sativa L.) is the staple food for half the 
world population, particularly for the poorest of Asia. 
Rice played a major role in the Green Revolution in the 
1960s, a well-known story of modern plant breeding that 
contributed significantly to global food security. Now, 
rice production must increase by at least 25% by 2030 to 
keep pace with predicted population growth. This has to 
be achieved using less land and less water, and under the 
more severe environmental stresses expected from climate 
change and disease pressures. Much of this 
increase must come from genetic improvement of rice. 
Rice research has progressed greatly in the past years, 
highlighted by the completion of the first high-quality rice 
genome in 2005 [1], which stimulated global efforts in rice 
functional genomics research [2,3]. However, the information 
from rice genetics and genomics research has 
yet to fundamentally change rice breeding practices. 
The currently available sequence data are not yet in a 
form readily usable by most rice breeders, nor is the 
global community yet prepared to manage data influxes 
that may well be orders of magnitude greater than previously 
encountered. 



Exploring rice genetic diversity 

Rice is known for tremendous within-species and within- 
genus genetic diversity, both critical foundations for rice 
improvement. This rich source of genetic diversity is 
preserved in more than 230,000 germplasm accessions 
of Oryza maintained in genebanks worldwide, mostly of 
Asian origin. Exploring this diversity at the sequence 
level has been, until recently, only a dream of rice scientists. 
But two years ago, the Chinese Academy of Agricultural 
Sciences (CAAS), the Beijing Genomics Institute (BGI) 
Shenzhen and the International Rice Research Institute 
(IRRI) launched a program to systematically sequence a 
broad spectrum of known diversity across the species. 
"The 3,000 (3K) Rice Genomes Project" is a major step 
towards revealing the genomic diversity in all of the world's 
rice germplasm collections. But for this ambitious effort to 
be meaningful beyond the scientific community, significant 
investments will have to be made in measuring plant 
performance under a wide range of conditions, as well 
as in the development of data management approaches 
that can apply the genetic knowledge to practical uses 
by extracting genotype-to-phenotype relationships for a 
better understanding of plant biology. 
Current status and plans

The 3K Rice Genomes Project has completed sequencing 
3,000 rice genomes with an average sequencing depth of 
14x. The panel of sequenced rice accessions represents 
a diverse set originating from 89 countries, and was 
selected from a combined collection of ~13,000 O. sativa 
accessions from the ~180,000 rice accessions conserved in 
the International Rice Genebank Collection (IRGC) at 
IRRI [4] and the China National Crop Genebank (CNCG) 
[5]. The 3K lines included most rice mega-varieties being 
grown across large areas of different ecosystems throughout 
Asia [6]. The parental lines of popular varieties and selected 
genetic mapping populations were also included; 
400 of these were parental lines for genome-wide introgression 
lines for multiple complex traits developed using 
novel molecular breeding strategies [7]. While this approach 
should capture most of the genetic variation in rice, a 
further round of sampling based on insights obtained 
from the 3K project will be needed to capture unusual 
and possibly highly useful variants [8]. 

The sequence data for the 3K rice genomes, deposited 
in the GigaScience journal database, GigaDB [9], provide 
an unprecedented resource for rice scientists. Not only 
will this giga-dataset form the basis for advancing our 
understanding of rice's history of selection (natural or 
imposed), it also provides the platform for large-scale discovery 
of genetic variation associated with important traits for 
breeding applications [7]. While scientists are thrilled with 
the potential impact of the project, many challenges remain 
in integrating the sequence data with genomic, genetic 
and phenotypic data from many sources. A rice diversity 
information portal and underlying databases will be needed 
to facilitate large-scale gene/trait discovery and allele 
mining [10], enable the development of new molecular 
breeding strategies [7], and inform a strategy for better 
conserving rice genetic resources [11]. Several discrete 
steps are necessary for the outcomes of the 3K Rice 
Genomes Project to have practical applications: 

1) Decipher global and local population differentiation; 

2) Construct new high-quality reference genomes 
representing major varietal groups; 

3) Create haplotype maps by linkage disequilibrium 
and recombination break-point analyses; 

4) Build one or more pan-genome assemblies for each 
of the varietal groups and create annotation mappings 
between pan-genomes; 

5) Discover single nucleotide polymorphisms (SNPs), 
structural variants and indels between and within 
populations. 
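
As an illustration of the linkage disequilibrium analysis named in step 3, the following is a minimal Python sketch and not the project's actual pipeline. It assumes genotypes are coded as alternate-allele dosages (0, 1 or 2) per accession and uses the squared Pearson correlation between two SNPs as an estimate of r². The function name `ld_r2` and the toy genotype vectors are hypothetical.

```python
import math
from typing import Sequence


def ld_r2(snp_a: Sequence[int], snp_b: Sequence[int]) -> float:
    """Estimate LD (r^2) between two SNPs from unphased dosages.

    Each sequence holds one alternate-allele dosage (0, 1 or 2) per
    accession, in the same accession order. Returns NaN when either
    SNP is monomorphic, because r^2 is undefined in that case.
    """
    if len(snp_a) != len(snp_b) or len(snp_a) < 2:
        raise ValueError("need two equal-length vectors with >= 2 accessions")
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    # Sums of squares and cross-products around the means.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return (cov * cov) / (var_a * var_b)


# Toy data (hypothetical): four accessions, three SNPs.
snp1 = [0, 0, 2, 2]
snp2 = [0, 0, 2, 2]  # always co-inherited with snp1
snp3 = [0, 2, 0, 2]  # independent of snp1 in this sample

linked = ld_r2(snp1, snp2)
unlinked = ld_r2(snp1, snp3)
```

Pairwise values of this kind, computed along a chromosome, are the raw material for delimiting haplotype blocks. A production analysis would more likely rely on phased haplotypes and established tools than on this hand-rolled estimate.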

While the applications are clear and the opportunities 
nearly boundless, the analytical challenges are indeed 
enormous. Sequence information, while offering bountiful 
material for evolutionary studies, offers little on its own 
for practical applications for rice breeders. Plans are being 
developed with multiple institutions under the auspices 
of the Global Rice Science Partnership (GRiSP) of the 
Consultative Group on International Agricultural Research 
for extensive and systematic characterization of phenotypes 
of accessions for a wide range of traits to discover important 
sequences and regions using genome-wide association 
studies. Phenotyping, coordinated by IRRI and CAAS, is 
in progress for biotic and abiotic stress tolerance, grain 
quality characters, plant development and yield traits. 
High-throughput phenomics using image and sensor capture 
from controlled environment and field-based (ambient 
and managed) platforms will contribute immensely to the 
ability to associate sequence information with phenotypes. 
This combined effort will provide greater depth and coverage 
than prior studies, yielding deeper insights and 
broader applications. 
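
To make the idea of associating sequence variants with phenotypes concrete, here is a minimal single-marker sketch in Python. It is an assumption-laden illustration rather than the method planned for the project: it regresses a quantitative trait on the allele dosage at one SNP by ordinary least squares and ignores population structure and kinship, which a real genome-wide association study would need to model. The function name `marker_regression` and the toy values are hypothetical.

```python
from typing import Sequence, Tuple


def marker_regression(
    dosages: Sequence[int], phenotypes: Sequence[float]
) -> Tuple[float, float]:
    """Regress a trait on allele dosage at a single SNP.

    Returns (slope, r_squared): the estimated change in the trait per
    copy of the alternate allele, and the fraction of trait variance
    explained by the marker in this sample.
    """
    if len(dosages) != len(phenotypes) or len(dosages) < 3:
        raise ValueError("need equal-length inputs with >= 3 accessions")
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in phenotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    if sxx == 0:
        raise ValueError("marker is monomorphic in this sample")
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    return slope, r_squared


# Toy data (hypothetical): dosage at one SNP and a trait value per accession.
toy_dosages = [0, 0, 2, 2]
toy_trait = [1.0, 1.0, 3.0, 3.0]
slope, r_squared = marker_regression(toy_dosages, toy_trait)
```

Repeating such a test across all SNPs, with appropriate correction for multiple testing and relatedness among accessions, is the basic shape of a genome-wide scan.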

Even before phenotypic data become available, we 
expect that analyses of the 3K rice genomes data will yield 
useful information, and that greater sequencing depth 
or higher sampling will be guided by analysis of the 
population structure. And, as multiple high-quality reference 
genomes are developed, many more SNPs in the 
pan-rice genome should be discovered. The emerging 
high-density maps will further facilitate efficient gene 
discovery and allele mining. Thus, an early outcome of the 
3K Rice Genomes Project will be new population-specific 
genotyping arrays useful to a wide range of genetic and 
breeding applications. Secondly, detailed studies should 
reveal population structures that have been shaped by 
evolutionary, domestication and selection processes. Identification 
and detailed analyses of unique cryptic structural 
genomic variants across the rice genome will allow us to 
understand their contributions to the previously identified 
varietal groupings in rice. Thirdly, by including lines used 
in mapping and breeding programs, we can target gene 
validation for direct use for trait improvement in breeding 
populations. Breeding populations developed from the 
sequenced lines will enable implementation, testing and 
improvement of novel breeding strategies, such as genomic 
selection and recurrent selection in rice breeding 
programs [6]. 

Challenges 

Completion of the sequencing and preliminary analyses 
of 3K rice genomes is just the first step in establishing an 
information platform of integrated databases and advanced 
tools to accelerate rice breeding. This effort will be similar 
in scope to the development of the Arabidopsis Information 
Portal (AIP) [12]. IRRI has initiated the International 
Rice Informatics Consortium (IRIC) under GRiSP. At the 
time of writing, discussions are underway to formalize 
the consortium agreement for IRIC and technical aspects 
of the portal design, standards for meta-data for interoperability, 
and persistent, diagnostic germplasm identifiers. 
First targets include curation of 3K rice genomes data and 
other public data, definition of reference genomes, design 
and archival of phenotyping datasets, and a web-based 
interface, or portal, and tools for population structure, 
genome-wide association studies and diversity browsing. 
Still, linking diversity in the 3K rice genomes dataset to 
phenotypic variation and environmental adaptation requires 
a long-term global effort in rice functional genomics 
research. For a more complete understanding of 
O. sativa genetic diversity and genes underlying important 
rice traits, future research should focus not only on identifying 
and characterizing rare genes/alleles with large effect, 
but also on novel allelic combinations underpinning complex 
traits. With such value-added information integrated 
into the database and access to appropriate tools through 
the Web portal, a more systematic discovery and enhanced 
utilization of rich genetic diversity will be possible [7,10,11]. 
While this project will undoubtedly stimulate another 
round of rapid advances in rice genetics, numerous 
challenges remain in extracting the most information from 
the sequence and phenomics data to establish a global, 
public information platform useful not only for experimental 
research, but also for practical rice breeding. 
These challenges will be overcome through global rice 
research efforts to ensure scientific advancements and 
delivery of benefits for rice farmers and to maintain the 
food security of humankind. The challenge is large and 
will require unprecedented collaboration that transcends 
national, institutional and personal ambitions. 
Endnote

a "Mega-varieties" refers to those varieties that have been 
grown on at least 1 M hectares. 